SeLeCT: a lexical cohesion based news story segmentation system
Abstract
In this paper we compare the performance of three distinct approaches to lexical cohesion based text segmentation. Most work in this area has focused on discovering textual units that reflect the subtopic structure within documents. In contrast, our segmentation task requires the discovery of topical units of text, i.e. distinct news stories from broadcast news programmes. Our approach to news story segmentation (the SeLeCT system) is based on an analysis of lexical cohesive strength between textual units using a linguistic technique called lexical chaining. We evaluate the relative performance of SeLeCT with respect to two other cohesion based segmenters: TextTiling and C99. Using a recently introduced evaluation metric, WindowDiff, we contrast the segmentation accuracy of each system on both ‘spoken’ (CNN news transcripts) and ‘written’ (Reuters newswire) news story test sets extracted from the TDT1 corpus.
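As a rough illustration of the WindowDiff metric named in the abstract, the sketch below slides a window of k gaps over a reference and a hypothesised segmentation and counts the windows in which the two disagree on the number of boundaries. The function name `window_diff`, the 0/1 gap encoding (1 marks a story boundary between two adjacent text units), and the example data are assumptions made for illustration; none of this is taken from the SeLeCT evaluation code.

```python
from typing import Sequence


def window_diff(reference: Sequence[int], hypothesis: Sequence[int], k: int) -> float:
    """Fraction of length-k windows where the boundary counts differ.

    Both inputs are 0/1 sequences with one entry per gap between adjacent
    text units (assumed encoding); 1 means a boundary falls in that gap.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same number of gaps")
    if not 1 <= k <= len(reference):
        raise ValueError("window size k must be between 1 and the number of gaps")

    n_windows = len(reference) - k + 1
    errors = 0
    for i in range(n_windows):
        # Count boundaries inside the current window for each segmentation.
        ref_count = sum(reference[i:i + k])
        hyp_count = sum(hypothesis[i:i + k])
        if ref_count != hyp_count:
            errors += 1
    return errors / n_windows


# Hypothetical example: the hypothesis places the first boundary one gap early.
reference = [0, 0, 1, 0, 0, 1, 0, 0]
hypothesis = [0, 1, 0, 0, 0, 1, 0, 0]
score = window_diff(reference, hypothesis, k=3)
```

A score of 0 means the two segmentations agree in every window, and 1 means they disagree in every window. The window size k is commonly set to about half the average reference segment length; here it is simply passed in by the caller.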
Similar resources
Spoken and Written News Story Segmentation Using Lexical Chains
In this paper we describe a novel approach to lexical chain based segmentation of broadcast news stories. Our segmentation system SeLeCT is evaluated with respect to two other lexical cohesion based segmenters TextTiling and C99. Using the Pk and WindowDiff evaluation metrics we show that SeLeCT outperforms both systems on spoken news transcripts (CNN) while the C99 algorithm performs best on t...
Segmenting Broadcast News Streams using Lexical Chains
In this paper we propose a coarse-grained NLP approach to text segmentation based on the analysis of lexical cohesion within text. Most work in this area has focused on the discovery of textual units that discuss subtopic structure within documents. In contrast our segmentation task requires the discovery of topical units of text i.e. distinct news stories from broadcast news programmes. Our sy...
Maximum lexical cohesion for fine-grained news story segmentation
We propose a maximum lexical cohesion (MLC) approach to news story segmentation. Unlike sentence-dependent lexical methods, our approach is able to detect story boundaries at finer word/subword granularity, and thus is more suitable for speech recognition transcripts which have no sentence delimiters. The proposed segmentation goodness measure takes account of both lexical cohesion and a prior ...
Modeling the statistical behavior of lexical chains to capture word cohesiveness for automatic story segmentation
We present a mathematically rigorous framework for modeling the statistical behavior of lexical chains for automatic story segmentation of broadcast news audio. Lexical chains were first proposed in [1] to connect related terms within a story, as an embodiment of lexical cohesion. The vocabulary within a story tends to be cohesive, while a change in the vocabulary distribution tends to signify ...
Probabilistic Latent Semantic Analysis for Broadcast News Story Segmentation
This paper proposes to perform probabilistic latent semantic analysis (PLSA) for broadcast news (BN) story segmentation. PLSA exploits a deeper underlying relation among terms beyond their occurrences thus conceptual matching can be employed to replace literal term matching. Different from text segmentation, lexical based BN story segmentation has to be carried out over LVCSR transcripts, where...
Journal:
- AI Commun.
Volume 17, issue
Pages -
Publication date: 2004